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Abstract 

Independent component analysis (ICA) has become a popular multivariate analysis 
and signal processing technique with diverse applications. This paper is targeted at 
discussing theoretical large sample properties of ICA unmixing matrix functionals. We 
provide a formal definition of unmixing matrix functional and consider two popular 
estimators in detail: the family based on two scatter matrices with the independence 
property (e.g., FOBI estimator) and the family of deflation-based fastICA estimators. 
The limiting behavior of the corresponding estimates is discussed and the asymptotic 
normality of the deflation-based fastICA estimate is proven under general assumptions. 
Furthermore, properties of several performance indices commonly used for comparison 
of different unmixing matrix estimates are discussed and a new performance index is 
proposed. The proposed index fullfills three desirable features which promote its use in 
practice and distinguish it from others. Namely, the index possesses an easy interpreta- 
tion, is fast to compute and its asymptotic properties can be inferred from asymptotics 
of the unmixing matrix estimate. We illustrate the derived asymptotical results and 
the use of the proposed index with a small simulation study. 

Keywords: Independent Component Analysis; Performance Indices; FastICA; FOBI; 
Asymptotic Normality. 



1 Introduction 

In the independent component (IC) model we assume that the components of the observed 
p-variate random vector x = (xi,...,Xp)^ are linear combinations of the components of a 
latent p- vector z = {zi, Zp)"^ such that zi, ...,Zp are mutually independent. Then 

x = nz (1) 

*Pauliina Ilmonen is a Postdoc, Universite libre de Bruxelles, Belgium, (email: Pauli- 
ina. Ilmonen@gmail.com). Klaus Nordhausen is a University Lecturer, University of Tampere, Finland (email: 
Klaus.Nordhausen@uta.fi). Hannu Oja is an Academy Professor, University of Tampere, Finland (email: 
Hannu.Oja@uta.fi). Esa Ollila is a Academy Research Fellow, Aalto University School of Science and Tech- 
nology, Finland (email: esa.ollila@aalto.fi) 



1 



where is a full-rank px p mixing matrix. The model is semiparametric as we do not make 
any assumptions on the marginal distributions of zi, Zp. In order to be able to identify a 
mixing mat rix one has to assume t hat at most one of the components zi, ...,2p is normally 
distributed, iHyvarinen et al.l (l200l[ ). Still after this assumption, the parameter matrix VL is 
not uniquely defined: Let C be the set oi p x p matrices with exactly one non-zero element 
in each row and in each column. If C G C then also z* = Cz has independent components 
and the model can be rewritten as x = VL*z* where f2* = VtC^^ . 

There are several possible, but not always satisfactory, solutions to this identifiability 
problem. One then fixes C by fixing either z* = Cz or Vt* = QC~^ in some way. First, the 
random vector z* = Cz can be fixed by requiring, for example, that the components of z* 
satisfy (i) Var{z*) = 1, i = 1, ...,p, (ii) /3^{z*) > 0, i = 1, ...,p, and (iii) /32{zl) > ... > /32{z;) 
where (3i and (32 are classical m oment-based s k ewness and kurtosis measures, res pectively. 
The above idea was extended in Ilmonen et al. ( 2010a ): Nordhausen et aP (2011a) by fixing 
the random vector Cz using two different location vectors and two different scatter matrices 
with the so called independence property. In these approaches, some indeterminacy still 
remains for random vectors with identical or symmetrical marginal distributions, for example. 
Second, the transformation matrix Q* = QC~^ can be fixed by requiring, for example, that 
||i7C~^ — IpW is minimized. Illmonen and Paindaveind (120111 ) used a unique representation 
QC~^ such that all the diagonal elements of QC~^ are one. In this paper we accept the 
ambiguity in the model ([1]), and try to define our concepts and analysis tools so that they 
are independent of the model specification, that is, of the specific choices of z and Q. 

In the independent component analysis (ICA) the aim is to find an estimate for an un- 
mixing matrix F such that Fx has independent components. Again, if F is a mixing matrix 
then so is CT for all possible matrices C G C. Thus F = in the model ([1]) is just one 
possible unmixing matrix and the ICA problem reduces to estimating an unmixing matrix 
only up to the order, signs and scales of the rows of Q~^. In the signal processing and 
computer science communities ICA procedures are usually seen as algorithms rather than 
estimates with their statistical properties. The most popular algorithms, if formulated with 
random variables, then often proceed as follows. 



1. In the model ([T]), one can assume without loss of generality that Cov{z) 
after whitening, we get the random vector 

y = Cov{x)-^/\x - E{x)) = V{z - E{z)) 



I p. Then, 



with some orthogonal matrix V . 

2. Using y, find a orthogonal matrix U = [ui, Up)'^ with the rows Ui, i = 1, ■■■,p, chosen 
to maximize (or minimize) a criterion function, say YJi=i \-^['^{''^Iy)]\- The optimization 
may be conducted one by one or simultaneously. The function G (measure of non- 
gaussianity, negentropy, kurtosis measure, log-likelihood function, etc.) is chosen so 
that the solution is U = up to possible sign changes and permutations of the rows. 

3. The final ICA solution is then F = UCov{x)-^/^. 

The fastICA algorithms described in Hyvarinen and Ojal ( 1997 ) for example works in this 
way. The rows of U are then found either one after another (deflation-based fastICA) or si- 
multaneously (symmetric fastICA). The sample versions are naturally obtained by replacing 
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the expectations by corresponding sample averages. For detai led descriptions of the fas - 
tlCA procedures and several other estimates and algorithms, se elCichocki and Amaril (120061 ) 

For other type of estimates, see Chen and Bickel fcoosl ) and 



and 


Hvvarinen et al. 


( 


2001) 


Chen and Bickel ( 


2006 


)■ 



Due to the vast amount of different ICA estimates and algorithms, asymptotic as well 
as finite sample criteria are needed for their comparisons. While results on asymptotic 
statistical properties (convergence, asymptotic normality, etc.) are usually missing in the 
literature, several finite-sample performance indices have been proposed for the comparisons 
in simulation studies. Let T be an unmixing matrix estimate based on the random sample 
X = (xi,...,x„)^ from the distribution in model ([T]). First, one can compare the "true" 
sources Zi (which are of course known in the simulations) and the estimated sources ij = Txi, 
i = 1, ...,n. Second, one can measure the closeness of the "true" unmixing matrix (used 
in the simulations) and the estimated unmixing matrix F. In both cases the problem is that 
F is typically not an estimate for Q~^. However, for any reasonable estimate F, either (i) 
there exists a C G C such that F is a consistent estimate of QC~^, or (ii) there exists a 
(possibly unknown or unspecified) matrix C E C such that CT is a consistent estimate of 
Q^^. Therefore, for a good estimate, the gain matrix G = TQ tends to be close to some 
matrix C E C. In this paper we discuss performance indices that are based on the use of 
G = TQ. A new index is proposed that finds the shortest distance (using Frobenius norm) 
between the identity matrix and the set of matrices equivalent to the gain matrix TQ. 

We organize the paper as follows. First, in Section O we give a formal (mathemat- 
ical) definition of the IC functional which is independent of the model formulation. We 
consider two families of IC functionals, (i) the family based on two scatter matrices with 
independence property, and (ii) the family of deflation-based fastICA functionals. We review 
limiting behavior of the corresponding estimates and we prove the asymptotic normality of 
the deflation-based fastICA under certain general assumptions. Previous attempts to prove 
the asymptotic normality of the deflation-based fastICA that have been presented in the 
literature contain severe faults. In Section [3] we consider the use of the gain matrix in the 
comparison of different IC estimates. Several approaches are discussed in detail. In Section 
H] a new index for the comparison is introduced. The computation of the new index is shown 
to be straightforward and easy. We also consider the limiting behavior of the index as the 
sample size approaches infinity; the asymptotic properties of the index are in a natural way 
determined by the asymptotic properties of the estimate F. The finite sample vs. asymptotic 
behavior of the index for several different ICA estimates with known asymptotics is illustrated 
in a small simulation study. Most proofs of the theorems are placed in the Appendix. 



2 IC functionals 

In this section we give a formal (mathematical) definition of an independent component (IC) 
functional. The definition is independent of the model formulation, that is, of the choice of Q 
and z. As an example we consider the family of IC functionals based on two scatter matrices 
with independence property, and the family of deflation-based fastICA functionals. 
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2.1 Formal definition 



Let Q be the set of all full-rank p x p matrices. Then naturally all unmixing matrices T ^ Q. 
Let P denote a permutation matrix (obtained from Ip by permuting its rows or columns), J 
a sign-change matrix (a diagonal matrix with diagonal elements ±1), D a rescaling matrix 
(a diagonal matrix with positive diagonal elements). For the definition of an IC functional 
we need the subset 

C = {C eg -.0 = PJD for some P, J, and D} . 

If C G C, each row and each column of C has exactly one nonzero element. Then C gives 
a group of afiine transformations (with respect to matrix multiplication) as it satisfies (i) if 
Ci,C2 e C then C1C2 G C, (ii) Ip G C, (iii) if C G C then there exists C7"^ G C such that 
CC^^ = C^^C = Ip. The group is not commutative (Abelian) as C1C2 = C2C1 may not be 
true. 

We say that two matrices Fi and F2 in Q are equivalent if Fi = CT2 for some C G C We 
then write Fi ~ F2 and give the following definition. 

Definition 2.1. Let denote the cdf of x The functional F(FxO e Q is an IC functional 
in the IC model (QP if (i) F(Fa.)fi ~ Ip and if (ii) it is affine equivariant in the sense that 
^{Fax) = ^{Fx)A~^ for all nonsingular p x p matrices A. 

Remark 2.1. The first condition says that T{Fx) and are equivalent matrices and that 
there exists C = C{Fx,Q) G C such that the "adjusted" IC functional Cr{Fx) = Q~^. Note 
that, if the second condition (ii) is replaced by a weaker condition (iii) T{Fax) ~ ^{Fx)A~^ 
for all nonsingular matrices A, then one can often find a new functional 

F*(F,)=C(F,)F(F,) 

with C{Fx) G C satisfying condition (ii). If the fourth moments exist, functional C = C{F^) 
may be defined by requiring, for example, that Var{{T*x)i) = 1, i = l,...,p, /3i((F*x)j) > 0, 
i = l,...,p, and (32{{T*x)i) > ... > f32{{T*x)p) where /3i and (32 are classical moment-based 
skewness and kurtosis measures, respectively. Then T*{Fax) = ^*{Fx)A^^ for all nonsingular 
p X p matrices A. Other criteria for constructing C{Fx) can be easily found. 

Remark 2.2. In practice, the IC functional is often seen rather as a set of vectors {71, 7p} 
than as a matrix F = (71, 7^)"^. If Pi = \ |7j| |~^7j7j' is the projection matrix to the subspace 
spanned by 7^, i = l,...,p, then the functional can also be defined as a set of projection 
matrices {Pi, Pp} . 

Note that, for an IC functional F in model ([I]) r{Fx)flC ~ Ip for all C G C. Therefore the 
definition of the IC functional does not depend on the specific formulation of the model (the 
choices of Q and z). Also, r{Fx)x = r{Fz)z where T{Fz) is in C. If we choose z* = r{Fz)z 
and fl* = Qr{Fz)~^, then r{Fx)Q* = Ip. This formulation of the model is then most natural 
(canonical) for functional r{Fx). 

2.2 Functionals based on two scatter matrices 

A scatter functional S{Frc) is a p x p -matrix-valued functional which is positive definite and 
affine equivariant in the sense that 

S{FAx+b) = ASiF,)A^ 
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for all nonsingular p x p matrices A and for all p-vectors b. A scatter functional S is said to 
possess the independence property if S{Fx) is a diagonal matrix for all x with independent 
components. Naturally, the usual covariance matrix 

S,{F,) = E{{x- E{x)){x - E{x)f) 

is a scatter matrix with the independence property. Another scatter matrix with the inde- 
pendence property is the matrix based on fourth moments, namely, 

S2{F,) = {{x - E{x)){x - E{x)fSi{F,)-\x - E{x)){x - E{x)f) . 

p + 2 ^ 

For any scatter matrix S{Frc), its symmetrized version 



where Xi ari d x^ ar e independent copies of x, has the independence propertv. lOia et all (120061): 



Tyler et al.l (120091). For symmetrized M-estimators and S-estimators, see iRoelandt et al 



(120091 ): LSirkia et all ( 120071 ). 

The IC functional T{Fx) based on the scatter matrix functionals Si{Fx) and 5*2 (-F^) is 
defined as a solution of the equations 



TSiT^ = Ip and F^sF^ = A 
where A = A{F;j.) is a diagonal matrix with diagonal elements Ai > 



> A„ > 0. One 



of the first solutions for the ICA problem, the FOBI functional, ICardosd (119891 ). is ob- 
tained if the scatter functionals Si and 5*2 are the scatter matrices based on the second and 



OUila et al. 


( 


2008b): 


Ilmonen 



( 2OI2I ) (complex data) . 

Assume now (wlog) that Q = Ip and that Si^F^) = Ip and 5*2 (-Fj,) = A where Ai > 
... > Xp > 0. Assume also that both 5*1 and S2 have the independence property. Write 
Si = Si{Fn) and 5*2 = S oiFn) ('values o f the fu nctionals at the empirical cdf We then 
have the following result, lllmonen et al.l ( l2010al ). 

Theorem 2.1. Assume that 

V^iSi - Ip) = Opil) and v^(^2-A) = Op(l), 
with Al > ... > Ap > 0, and the estimates f and A are given by 

tSit^ = Ip and f^2F = A. 
Then, there exists a sequence of estimators such that T Ip, 



^y/ndiag{Si - Ip) + Op(l) 



y/ndiagit — Ip 
\/n ofF(f — Ip 



where H is a p x p matrix with elements 



and 



^/^ HQ ({S2-A)-{Si- Ip)A) + Op(l) 



Hi. 



0, ifi= j, and Hij = (Ai - Xj) \ if i j. 
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Above off(r) = r — diag(r), where diag(r) is a diagonal matrix with the same d iagona l 



Ilmonen et aD feniOah 



elements as F, and denotes the Hadamard (entrywise) product, 
considered the limiting distribution of the FOBI estimate (with limiting covariance matrix) 
in more detail. It is interesting to note that the asymptotic behavior of the diagonal elements 
of F does not depend on 5*2 at alL 



App roaches s uch a s JADE, ICardoso and Souloumiad (jl993l ). or the matrix-pencil ap- 



proach, lYeredorl ( 120091 ) . (approximately) jointly diagonalize two or more data matrices (not 
necessarily scatter matrices). The asymptotic properties of these estimates are typically 
however still unknown. 



2.3 Deflation-based FastICA functionals 



Our second example on families of IC functionals is given by the deflation-based fastICA 
algorithm. FastICA is one of the most popular and widespread ICA algorithms. Detailed 
exar nination of f astIC A fun c tional s are provided for example in iHyvarinen and Ojal ( 119971 ) 
and lOUilal ( 120101 ). In lOUilal ( 120101 ). the asymptotic covariance structure of the row vectors 
of deflation-based fastICA estimate F is given in closed form. No rigorous proof of the 
asymptotic r iormality of th e fastI CA estimate has been presented in the literature so far; see 
for example IShimizu et al.l (120061 ). In this section we discuss the conditions needed for the 
asymptotic normality of the deflation-based fastICA estimate. 

Assume that x = Qz as in model ([1]) with finite first and second moments E{x) = fi 
and Cov{x) = S. In this approach the first row of F is obtained when a criterion function 



|E(G(7 



T, 



X 



/i)))| is maximized under the constraint 7 S7 = 1. If we wish to find more 



than one source then, after finding 71, 
under the constraint 



, 7fc_i, the kth source maximizes \E{G{'y 



[X 



Ik^lk = 1 and 7jS7fc = 0, j = 1, k - 1. 
If G satisfies the condition 



|^(G(«iZi + «2^2))| < max(|E(G(^i))|, |^(G(^2))|) 



for all independent zi and Z2 such that E{zi) = E{z2) = and E{zf) = E{z2) = 1 and for 
all «! and 0:2 such that af + a^ = 1, then the independent components are found using the 
above strategy. It is easy to check that the condition is true for the classical kurtosis measure 
G{z) = z^ — 3, for example ( Bugrienl . 20051 ). 

Write T{F) for the mean vector (functional) and S{F) for the covariance matrix (func- 
tional). The kth fastICA functional 7a;(-F) then optimizes the Lagrangian function 



|E[G(7r(x - r(F)))]| - ^hlSiFh, - 1) - 5^ A,.7j^(F)7., 

i=i 

where Ai^, A^fc are the Lagrangian multipliers. U g = G' then one can easily show that 
(under general assumptions) the functional F = (71, ...,7^)^ satisfies the p estimating equa- 
tions 



k-l 



E[9{ll{x - T{F)){x - T(F)))] = S{F)Y,1,^'; E[g{il{x - T{F)){x - T{F)))l 
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k = If z = Tx has independent components then T solves the above estimating 

equations. Note, however, that the estimating equations do not fix the order of sources 
7i, ...,7p anymore. 

Remark 2.3. As mentioned before, the ICA procedures are often seen as algorithms rather 
than estimates with statistical properties. The popular choices of g for practical calculations 
are powS: g{z) = z^, tanh: g{z) = tanh{z), and gauss: g{z) = ze~^ for example. If 
E{x) = then the fastICA algorithm for 7^ uses the iteration steps 

1. 7. ^ ^-^E[g{^lx)x] - E\g'{^lx)]^k 
3. 7fc ^ 7fc/ \/ll^lk 

The sample version is naturally obtained if the expected values are replaced by the averages in 
the above formula. It is important to note that it is not known in which order the components 
are found in the above algorithm. The order depends strongly on the initial value in the 
iteration. 

We next consider the hmiting behavior of the sample statistic T based on a random 
sample We assume that E{xi) = and Cov{xi) = Ip and that the true value is 

r = Ip = (ci, Cfc)^. Write x and S for the sample mean vector and sample covariance 
matrix, respectively. If the fourth moments exist then ^/nvec{x, S — Ip) have a limiting 
multivariate normal distribution (CLT). Write f = (71, 7p)-^ for the fastICA estimate of 
r. Write also 

/ifc = E[g{elxi)], Afc = E[g{elxi)elxi] 

and 

Tk = E[g'{elxi)elxi], 6k = E[g'{elxi)], 

k = 1, ...,p. We need later the assumption that 6k, k = 1, ...,p — 1. (If g{z) = z^, for 
example, this assumption rules out the normal distribution.) For sample statistics 

^ n 1 " 

Tk = -y^(g{elxi) - iJ,k)xi and fk = -y^Q{^k{xi-x)){xi-x) 

we need the assumption that, using the Taylor expansion, 

A/ra(Tfc - AfcCfc) = \fnTk - Tkekcl^/nx + Ak^/n{^k - Cfc) + op(l) (2) 

where = E[g' {el.Xi)xixJ], k = 1, ...,p. Again, if g{z) = z^ and the sixth moments exist, 
then ([2]) is true and y/n{Tk — Xk^k) has a limiting multinormal distribution. The estimating 
equations for the fastICA solution T = (71, ...,7p)^ are then given by 

Tk = ^[7i7f + ••• + lklI]Tk, k = l, ...,p. (3) 
If ([2]) is true and Uk = Yl'j=i ^j^J then 

k 

{Ip - Uk)^/n{fk - XkCk) = XklV^iS - Ip)ek + ^ ejel^/n{'^j - ej) 

i=i 

+ Vnilk - ek)] + Op{l) 

and we get the following result. 
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Theorem 2.2. Let xi, x„ be a random sample from the model (Qp with Q = Ip, E{xi) = 0, 
and Cov{xi) = Ip. Let T = (71, ...,7^)-^ be the solution for estimating equations in and 
the estimate satisfies T — t-p Ip. Then, under the general assumptions given above, 



Vn{%k - 1) 



eJy/nTk - XkVnSki + op(l), for I > k 



aA7, 



kl 



y/n'Jik - y/nSki + op{l) for I < k 



Remark 2.4. Theorem \2.2\ implies that, if y/n{Tk — Xk^k), k = l,...,p, and ^Jnvec{S — 
Ip) have a joint limiting multivariate normal distribution then also the limiting distribution 
of y/nvec{T — Ip) is multivariate normal. Interestingly enough, the limiting distribution of 
the estimated sources 7i,...,7p depends on the order in which they are found. The limiting 
behavior of the diagonal elements oft does not depend on the choice of the function g{z). 
The initial value for T in the fastICA algorith m fixes the asymptotic order of the sources. 
For more details, see lNordhausen et al. 



3 On Performance Indices 

Let X = {xi, ...,Xn)'^ be a random sample from the model ([T]) with some choice of Q and 
z. An estimate of the population quantity T{Fx) is obtained if the functional is applied 
to the sample cdf We then write F or r(F„) or T{X). The gain matrix G = TQ is 
then generally used to compare the performances of different estimates. For any reasonable 
estimate, G C for some C = C{Fz) G C. How can one then compare matrices G 
converging to a different population value C that depend on functional F and the specific 
choice of ^2 and z in the model ([I])? 

3.1 Canonical parametrization 

For a comparison of different estimates F choose, separately for each IC functional F, the 
corresponding canonical parametrization 

x = n*z* = {nv{F,)'^){v{F,)z). 

Note that VLT{Fz)~^ does not depend on the model formulation (the original choices of VL and 
z) at all and that T{Fx)VtT{Fz)~^ = Ip. A correctly adjusted gain matrix 

G = fF(F,)-i = mF(F,)-i 

can then be used for a fair comparison of different estimates F as in the model ([1]) G — t-p Jp 
for all F . A natural performance index can then be defined as D'^{G) where 

D^{G) = \ \G-Ip\\^. 

If ^/nvec{G — Ip) — t-^ Np2[0, Sr) (as is true with the estimates in Sections 12.21 and I2.3P then 
we get the following result. 
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Theorem 3.1. Assume that, for the correctly adjusted gain matrix 

G = mT{F,)-\ 

it holds that ^Jn vcc(G' — Jp) -^^ -?Vp2(0,Sr). Then the limiting distribution of n ■ D'^{G) is 
that of^^^^SiXi where xi,----,Xk ^'^^ independent chi squared variables with one degree of 
freedom, and 6i, ...,Sk are the k nonzero eigenvalues (including all algebraic multiplicities) of 
Sr. 

3.2 Adjusted functional 

It is often hoped that the independent components in r{Fx)x are standardized in a similar 
way and/or given in a certain order. To formahze this step, we then need the following 
auxiliary functional to standardize (rescale and reorder) the components. 

Definition 3.1. Let denote the cdf of x The functional C{Fx) & C is a standardizing 
functional if it satisfies 

C{Fa.)=C{F,)A-\ for all a eC 

Remark 3.1. If the fourth moments exist, functional C = C{Fx) may be defined by requiring, 
for example, that Var{{Cx)i) = 1, i = l,...,p, /3i((Ca;)j) > 0, i = l,...,p, and /32((C'x)i) > 
... > j32{{Cx)p) where j3i and ^2 o-fe, as before, classical moment-based skewness and kurtosis 

measures, respectively. Of course, the functional is not well defined if the components have 
the same distribution. Note, however, that the corresponding sample statistic is uniquely 
defined (with probability one). Other standardizing functionals can be easily found. 

Definition 3.2. Let F^ denote the cdf of x, and ^{F^) an IC functional. Then the adjusted 
IC functional r*{Fx) based on C{Fx) is 

r*(F,)^C{Fr^F^),)r{F,). 

Note that adjusted IC functionals are directly comparable as they all estimate the same 
population quantity. The estimate is 

r* = C{XT{Xf)T{X) 

and the gain matrix reduces to 

G = fT*(F^)-i = G{xr{xf)r{x)nG{F,)-\ 

The standardizing functional C{F) is thus needed to fix the scales, the signs, and the 
order of the estimated independent components. The rescaling part D{F) of the functional 
G{F) is a diagonal matrix with positive diagonal elements, and it is often determined by a 
scatter functional S{F) so that 

D{F,) ^ {di^g{S{F,)))-'/\ 

The rescaled IC functional is then r*{Fx) — D{Fr(F^)x)^{Fj;) with the sample version 

/ . . . \ -1/2 
r* = DT where D = fdiag(r5r^) j 

We next consider the effect of the rescahng functional. 
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Theorem 3.2. Assume (w.l.o.g.) that Q = Ip and S{Fz) = Ip. Assume that ^Jn{S — Ip) = 
Op{l) and ^/n(^—D^^) = Op(l) for some diagonal matrix D with positive diagonal elements. 

Write f* = bt where D = (^diag{t St'^)^ ^ ^ Then 

^{r - Ip) = -^diag{S - Ip) + V^offiDT - Ip) + op{l). 

The gain matrix for the comparisons is thus 

G = t*QD{F,)'^ 

with the limiting distribution given by Theorem 13.21 As, for aU the estimates F*, the limiting 
behavior of the diagonal elements of G is similar, one can use ||off(G)|p in the comparisons. 
If y/nvec{G — Ip) -^d Np2{0, Er*) then we get the following result. 

Theorem 3.3. Assume that, for the gain matrix of the adjusted estimate 
G = rQD{F,)-\ 

it holds that y/n vec(G' — Ip) —>d Np2(0,T,T*) ■ Then the limiting distribution o/n||oj(f(G)|p 
is that ofY^\^^ 6iXi where Xi, xl (^f^ independent chi squared variables with one degree of 
freedom, and 6i,...,6k are the k nonzero eigenvalues (including all algebraic multiplicities) of 

{Ip2 — Dp^p)T,r*{Ip2 — Dp^p), 

with Dp^p = ELi(eief) (g) (ejcf). 



3.3 Solution as a set {71, ...,%} 

Note that the first two approaches above do not depend on how we fix Q and z in the 
model ([1]). In these two approaches it is assumed, however, that TQ is a root-n consistent 
estimate of some C G C. Among other things, this means that the order, signs, and scales 
of the independent component functional are fixed in some way. In practice, the solution 
in the ICA problem is often seen rather as a set {71, ■■■,%} than a matrix T = (71, ...,7^)-^. 
The vectors jj, i = l,...,p, span corresponding univariate linear subspaces; thus the order, 
signs and lengths of 7^ are not interesting. Finally, in the comparisons, one is usually only 
interested in the set of gain vectors {^1, ...,gp} where gi = n^7i, i = I, ■■■,p, not in the gain 
matrix G = {gi, ■■■,gp)^ itself. 

A common way to standardize the lengths of the rows of the gain matrix is to transform 
G —7- DG where Z) is a diagonal matrix with diagonal elements 

Ai = jmax , i = l,...,p. (4) 

We then have the following result. 

Theorem 3.4. Assume that Q = Ip and that y/n(G — D^^) = Op(l) where D is a diagonal 
matrix with positive diagonal elements. Let ID be a diagonal matrix given Q). Then 

V^ibG - Ip) = ^/^off{DG - Ip) + op(l). 
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The inference-to-signal (ISR) ratio and inter-channel inference (ICI). lDouglasl ( 120071 ). uses 
this row-wise consideration and is given by 



p 

i=l 



EP /A2 
.7 = 1 



P 



1 



This index is invariant under permutations and sign changes of the rows (and columns) of G, 
and it is also naturally invariant under heterogeneous rescaling of the rows. It depends on the 
choice of fl but no adjustment of T is needed. Theorem 13.41 can be used to find asymptotical 
properties of this criterion. 



One of the most popular performance indices, the Amari index, lAmari et al.l ( 119961 ). is 
defined as 



1 

P 



p 



,i=i maxj- \Gij\ 



+ 



sr., \G, 



^ maxj \G 



The index is invariant under permutations and sign changes of the rows and columns of G. 
However, heterogeneous rescaling of the rows (or columns) on G changes its value. Therefore, 
the rows of T should be rescaled in a suitable way and use G = DTQ. (A general practice 
in the signal processing community is that Q and z are chosen so that Cov{z) = Ip and that 
the sample covariance matrix of txi, ...,rxn is /„ as well.) However, as the index is based 



on the Li norm, its limiting distr ibution is quite complicated. The intersymbol interference 



(ISI), iMoreau and Macchil (119941 ). is similar to the Amari index in that it is also based on 



similar row- wise and column- wise considerations and that similar adjusting is needed for f . 



Chen and Bickell (l2006l ) for example use an invariant criterion by computing the norm 



\\TQ — Ip\\, after suitable rescaling, sign changing, and permutation of the rows of F and 
columns of Q. 



4 A new index for the comparison 
4.1 Minimum distance index 

Let A he a p X p matrix. The shortest squared distance between the set {CA : C E C} oi 
equivalent matrices (to A) and Ip is given by 

D\A) = inf \\CA-Lf 

where || ■ || is the matrix (Frobenius) norm. 

Remark 4.1. Note that D^{A) = D\CA) for allC eC. 

Theorem 4.1. Let A be any p x p matrix having at least one nonzero element in each row. 
The shortest squared distance D'^{A) fulfils the following four conditions: 

1. 1 > D^{A) > 0, 

2. D'^{A) = %f and only if A Ip, 
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3. D [A) = 1 if and only if A Ipfl for some p-vector a, and 

4- the function c — )■ D'^{Ip + c off (A)) is increasing in c E [0, 1] for all matrices A such 
12. 



that Af, <l, i^ J. 



Let X = [xi, be a random sample from a distribution where x obeys the IC 

model ([I]) with unknown mixing matrix Q. Let T{F) be an IC functional. Then clearly 
D'^{T{Fx)fl) = 0. If Fn is the empirical cumulative distribution function based on X then 

f = f(x) = r(F„) 

is the unmixing matrix estimate based on the functional r{Fx). 

The shortest distance between the identity matrix and the set of matrices {CTQ : C G C} 
equivalent to the gain matrix G = TQ is as given in the following definition. 



Definition 4.1. The minimum distance index for T is 

1 



b = D{m) = inf II cm - ip\ 



It follows directly from Theorem 14.11 that 1 > -D > 0, and Z) = if and only if f ~ ^2^^. 
The worst case with £) = 1 is obtained if all the row vectors of TQ point to the same 
direction. Thus the value of the minimum distance index is easy to interpret. Note that 
D{tn) = D{Cm) for all C eC. Also, 

D{r{XA'^)AQ) = D{r{X)Q). 

Note also the nice and natural local behavior described in Theorem 14.11 condition 4. 



Theis et al.l (120041 ) proposed an index called the generalized crosstalking error which is 
defined as the shortest distance \ \Q — r^^C|| between the mixing matrix Q and its adjusted 
estimate r~^C, C EC. The generalized crosstalking error is then defined as 

Eiyt.V) = inf \\Q-t-^C\\ 

where || ■ || denotes a matrix norm. Clearly, E{Q, f ) = E{Q, Ct) for all C G C but 
E^AQjTlXA^)) = E[Q,r{X)) is not necessarily true. If the Frobenius norm is used, the 
new index may be seen as a standardized version of the generalized crosstalking error as 



b = inf ||C-if (Q-T-^C 
c&c \ 



Note that, unlike the minimum distance index, the values of the Amari index for TVl and 
DTVt (with a diagonal matrix D) may differ. The Amari index thus silently assumes that the 
rows of F are prestandardized in a specific way. The minim um distance index is compared 
to other indices in more detail in Nordhausen et al. fl2011bl l 



4.2 Computation 

At first glance the index t) = D(TQ) seems difficult to compute in practice as the mini- 
mization is over all choices C G C. However, the minimization can be done by two easy 
steps. 
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p 


3 


5 


10 


25 


50 


100 


Time 


0.19 


0.29 


0.64 


3.13 


12.62 


57.54 



Table 1: Computation time in seconds for 1000 indices for different dimensions p. 



Lemma 4.1. Let V denote the set of all p ^ p permutation matrices. Let G = TQ, and let 

Gij = G^j/^^=iG'ik> hi = Now the minimum distance index can be written as 



D = D(G) = -^(p-max(MPG))) 



1/2 



The maximization problem 



max (^tr(PG') 

over aU permutation matrices P can be expressed as a linear programming problem where the 
constraints are that all rows and all columns must add up to 1. In a personal communication 
Ravi Varadhan pointed out that it can be seen also as a linear sum assignment problem 
(LSAP). That LSAP, which is a special case of linear programmi ng, is equivalent to fi nding 
a minimizing permutation matrix as is stated for example in (jPantzig and Thapal . Il997l 



Chapter 8.5). The cost matrix A of the LSAP in this case is given by Ajj = Y2^=ii-^jk~Gik)'^, 
i,j, = 1, ...,p, and many solvers exist for th e computation. We used the Hungarian method 



(see e.g. iPapadimitriou and Steiglitzl ( 1l982l )) to find the maximizer P, and in turn compute 
i) itself. 

The ease of computations is demonstrated in Table [1] where we give the computation 
time of thousand indices for randomly generated p x p matrices in different dimensions. The 
computations were performed on an Intel Core 2 Duo T9600, 2.80 GHz, 4GB Ram using 
MATLAB 7.10.0 on Windows 7. 



An R-implementation of the index is available in the R-package JADE, iNordhausen et al. 



fl2011cf ). 



4.3 Asymptotic behavior 

Let the model be written as x = Qz, where now z is standardized such that r(F^) = Ip. 
Then T{Fri.) = and without any loss of generality we can assume that T{Fx) = Q = Ip. 
We then have the following. 

Theorem 4.2. Assume that the model is fixed such that T{Fx) = Q = Ip and that ^Jn vec(r — 
Ip) iVp2(0,S). Then 

nD2 = -_-||off(r)||2 + op(l) 

and the limiting distribution of nD"^ is that of {p — l)~^^jLi^jX? where Xi,----,Xk ^'^^ 
dependent chi squared variables with one degree of freedom, and 61,..., 6k are the k nonzero 
eigenvalues (including all algebraic multiplicities) of 

ASGOViV^ vec{oS(T))) = (1^2 - Z^p,p)S(/p2 - Dp^p), 

with Dp^p = ELil^i^f) ® (cicf). 
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Note that, for the theorem, we fix the model in a specific way (canonical formulation, 
T{Fz) = Ip) to find the limiting distribution. Then, for all choices of of Q and z, 



nD' = -||off(fr(F,)-i)|p + op(l) 

p — 1 

where f is as in Theorem 14. 2[ Note also that the mean of the limiting distribution of 
n{p — 1)(-D^) is equal to tr (^ASCOV{y/n vec(off(f )))j , which is a regular global measure 

of the asymptotic accuracy of the estimate F in a model where it is estimating the identity 
matrix. Furthermore, to calculate this limiting value, it is enough to know the asymptotic 
variances of elements of F only. Recall that the variances of diagonal elements are not used. 

It is also important to note that similar asymptotical results for the Amari index cannot 
be found since (i) it is not invariant in the sense that the values for TQ and DTQ may differ, 
and (ii) it is based on the use of Li norms. 

Remark 4.2. The new performance index presented in this paper is based on 

inf ||Cm-/J|. 

This formulation can be seen as a method that fixes the mixing matrix Q and transforms T 
to optimally adjusted CT. The index is not invariant under the transformations Q — t- QC^^ . 
One could alternatively base the index on 

inf limC-JJI. 

This alternative formulation can be seen as a method that fixes the unmixing matrix estimate 
r and transforms Q to optimally adjusted (random) QC~^ . Asymptotical behavior of this 
index is similar to that of the minimum distance index D but it is not invariant under trans- 
formations F — )■ Cr. It seems more natural to us to fix Q and z and allow transformations 
to T. 

Remark 4.3. Still another interesting possibility is to define the criterion index as 

inf WCiVnCo'^ - IJ. 

This index is naturally invariant under both T — t- CiT and Q — QC2^ and is fully model 
independent. Unfortunately, it does not seem to work in practice. In the bivariate case, for 
example, it is easy to see that, for all choices of gu ^ 0, g22 and g2i, the gain matrices 

V 921 922 

all give the optimal index value zero. 
4.4 A simulation study 

The finite-sample behavior of the new index D is now considered for three estimates, namely, 
(i) the FOBI estimate, and (ii) the defiation based fastICA with g{z) = z^ (pow3), and 
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the deflation based fastICA with q(z) = t a nh(z) (tanh) . The asymptotic nor mality of 
the FOBI estimate is proven in lllmonen et al. fl2ninah . See lllmonen et al.l toiOih also for 
the limiting covariance matrix of the FOBI estim ate. The asy mptotic covariance matrix of 
the deflation based fastICA estimate is given in lOUilal (|2010[ ). Asymptotic normality was 
proven in this paper. If the parametric marginal distibutions were kno wn, it is po s sible to 
find the maximum likelihood estimate (MLE) of the unmixing matrix; lOllila et al.l ( l2008al ) 
found its limiting covariance matrix. As a general reference value we can then compute the 
Cramer-Rao type lower bound for tr (^ASCOV{y/n vec(off (f )))y 

The simulation setup consists of three {p = 3) independent components with Laplace, 
logistic and beta{3, 3) distributions. They were all standardized to have expected value and 
variance 1. In the simulations, the mixing matrix Q was the identity matrix I^. The sample 
sizes were n = 5000, 10000,^ 25000, 50000, 75000, 100000 with 10000 repetitions, and for 
each repetition the value of D was computed for all the estimates. As shown for the fastICA 
estimates in Section [2l3l the limiting distribution of the estimated sources 71, ...,7p depends 
on the order in which the algorithm finds them. In practice, the order can be controlled with 
the initial value of the algorithm. Using the identity matrix as initial value, for example, 
finds the sources in the order they are given above, and a permuted identity matrix as a 
starting value finds the sources in a similarly permuted order. To illustrate this property in 
our simulations, we extracted the sources in two different orders, (a) 6eta(3,3), logistic and 
Laplace, and (b) Laplace, logistic and beta{3, 3). The estimates are then denoted by pow3(a), 
pow3(b), tanh(a), and tanh(b), respectively. 

Using the results in Ilmonen et al.l ( 2010a ). one can calculate the limiting variances of the 
components of ^/n{^FOBI — I)- As a matrix form, the variances then are 



V, 



FOBI 




26.71 5.07 
0.80 8.78 
8.51 0.33 



where (yFOBi)ij is the limiting variance of y/nitpoBi — hj = 1,2,3. Then 

tr (^ASCOV{y/^ Yec{oS{tpoBi)))) = 24.38 + 4.43 + 26.71 

+ 8.51 + 5.07 + 8.78 
= 77.88. 



Similarly, using results in lOUilal (120101 ) 



0.33 5.45 5.45 
4.45 0.80 16.43 
4.45 15.43 1.25 



and K 



pow3(b) 



and then 

and 

Finally, 



tr i^ASCOViV^ vec(off (rp,^3(a)))) 
tr (ASCOViV^ vec(off (fpo^3(6)))) 



tanh{a) 



0.33 7.75 7.75 
6.75 0.80 11.37 
6.75 10.37 1.25 



tanh(b) 



1.25 7.00 7.00 
6.00 0.80 16.43 
6.00 15.43 0.33 

51.66, 

57.86. 



1.25 3.01 3.01 
2.01 0.80 11.37 
2.01 10.37 0.33 
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FOBI 



<■ 

1 I 



5000 25000 50000 75000 100000 
n 

Figure 1: Boxplots for n{p — 1)D^ based on the FOBI estimate for different sample sizes n 
and 10000 repetitions on log scale. The three independent components have Laplace, logistic 
and beta{3, 3) distributions. The horizontal line gives the limiting mean value. 



which gives 

tr (^ASCOV{V^Yec{oS{ttanhia))))) = 50.74 

and 

tr (^ASCOViy/^ vec(off (ft,„^(6))))) = 31.78. 

There are quite big differences in the asymptotic behavior of the fastICA estimates only 
depending on the order in which the sources are found. Note also that the variances of 
the diagonal elements of F are equal for all the estimates studied here. They are simply 
the limiting variances of the sample variances of the standardized independent components 
divided by 4 as, in all the cases, the regular covariance matrix is used to whiten the data. 
The variances of the diagonal elements of F are then not used in the comparison. 

Boxplots in Figures [HE] and [3] illustrate the finite-sample behavior of the index for different 
estimates. The horizontal lines give the limiting mean values on a log scale. The FOBI 
estimate is known to converge in distribution to a multivariate normal distribution, but 
the convergence is very slow. The distributional convergence of n{p — 1)D'^ is then also 
slow as is seen from Figure [H What is interesting, is that the speed of convergence of the 
distribution (not only the covariance structure) of the fastICA estimate seems to depend on 
the order of the found sources, see Figure [2] and Figure [31 The distributional convergence 
of n{p — 1)Z)^ for tanh(b) seems to be faster than that for tanh(a), see Figure [3] The same 
is true for pow3(b) and pow3(a) as well, see Figure [2l The estimated means of n{p — 1)Z)^ 
for different estimates F are compared in Figure [4] again with asymptotic horizontal lines. 



1000.0 



,100.0 



> 

CO 



;r 10.0 

I 



1.0 



0.1 



16 



fastICA using pow3 (a) 
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fastICA using pow3 (b) 
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Figure 2: Boxplots for n{p — 1)D'^ based on the fastICA estimates pow3(a) and pow3(b) for 
different sample sizes n and 10000 repetitions on log scale. The three independent components 
have Laplace, logistic and beta{3, 3) distributions. The horizontal line gives the limiting mean 
value. 
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fastICA using tanh (a) 
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fastICA using tanhi (b) 



1000.0 



100.0 



10.0 

c 



I 

Q. 



1.0 



0.1 




5000 



25000 



50000 



75000 



100000 



Figure 3: Boxplots for n{p — 1)D'^ based on the fastICA estimates tanh(a) and tanh(b) for 
different sample sizes n and 10000 repetitions on log scale. The three independent components 
have Laplace, logistic and beta{3, 3) distributions. The horizontal line gives the limiting mean 
value. 
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100 H MLE A pow3(a) A pow3 (b) 

-o- FOBI -O - tanh(a) -O - tanh (b) 
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r 
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5000 



Figure 4: The estimated mean values oin{p—l)D^ for the estimates FOBI, pow3(a), pow3(b), 
tanh(a) and tanh(b). The dashed horizontal lines give the corresponding limiting mean 
values. The solid horizontal line is the limiting mean for the MLE (with known marginal 
distributions) . 
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The performance of the FOBI estimate is clearly worst. The MLE with the assumption that 
the margi nal distributions are known provides the Cramer-Rao lower bound for the limiting 
OUila et al.l (l2008al ). The order in which the sources are found seems to have a 



mean, see 



huge effect on the performance of the fastICA estimate. If the sources are found in the order 
6ei:a(3,3), logistic, and Laplace, there is no big difference between choices pow3: g{z) = 
and tanh: g{z) = tanh{z). If the order is Laplace, logistic, and 6eita(3,3), the estimate tanh 
perform very well while the estimate pow3 gets worse. 



5 Summary 



Independent component analysis (ICA) has gained increasing interest in various fields of 
applications in recent years. As far as we know, this paper provides the first rigorous (math- 
ematical) definition of the IC functional. The functional is defined in a general semiparametric 
IC model and is independent from the parametrization of the model. 

The deflation-based FastICA algorithm is one of the most popular ICA algorithms. Sev- 
eral superficial attempts to find the limiting distribution and limiting cova riance matrix of 



the Fa s tICA mixing rnatrix e stimate can be f ound in the literature (see e.g. iTichavsky et al. 
(I2OO5I ). IShimizu et al.l (|2006[ ). iRevhani et al.l ( 20121) ). The correct limiting covariance matrix 



was found however quite recently in lOUilal (120101 )7 In this paper we provide the assumptions 
needed for the limiting multivariate normality. 

For several popular ICA procedures, the statistical properties are still unknown, and 
their performances are compared using different performance criteria in simulation studies. 
In this paper we discuss several criteria in detail and suggest a new performance index with 
an easy interpretation. The asymptotic behavior of the new index depends in a natural 
way on the eigenvalues of the limiting covariance matrix of an unmixing matrix estimate. 
This is illustrated in a small simulation study with some deflation-based FastICA estimates 
and with FOBI estimate. We did not use other ICA procedures in our study as, for other 
estimates proposed in the literature, the limiting properties are still unknown and/or their 
implementations cannot deal with the sample sizes of our study. Note also that the new 
index can also be computed using the correlation matrix between the estimated and true 
so urces. In that case the ind ex has a nice connection to the mean-squared error as discussed 



m 



Nordhausen et all (l2011b( ). 



The theory pre s ented i n this paper has also important practical implications. For example, 
Nordhausen et al.l ( l2011dl ) introduces a new reloaded deflation-based FastICA algorithm that, 
using a preliminary estimate and the results here, extracts the sources in an optimal order 
to minimize the trace of the limiting covariance matrix. 



Appendix 

The Proof of Theorem 1331 



Assume that ^/n vec{G — Ip) — t-^ A^'p2(0,Sr). Now it follows directly from (iTanl . 11977 , 
Theorem 3.1) that the limiting distribution of n ■ D'^{G) is that of Yl!i=i ^iXl where , xl 
are independent chi squared variables with one degree of freedom, and 6i,...,Sk are the k 
nonzero eigenvalues (including all algebraic multiplicities) of Ep. 
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The Proof of Theorem [3721 

For simplicity, we consider the elements FJ^ and only. The proofs for other elements are 
similar. First note that 

p p 



\i=i j=i 

= (2(fn - D^,')D^,' + (^n - l)/^n') + Op{l). 



But then 



and 



11 . 



Mtl,-1) = v^((f5f')n'^'fn-l 

= (((f^f')n - Dn)D^,' + Dn(f n - D 

+op(l) 
= -^{Su-l)+op{l). 

Finally, 

y^f*2 = ((f^r)n ^'fia) = v^/^n'f 12 + op(l). 
The Proof of Theorem [373] 

Assume that vec(G' — Ip) -^d A'p2(0, Sr*); and let Dp^p = ^^^^(ejef) (eiej). Now 
vec(off((j)) = {Ip2—Dpp)vec{G — I^) and t hus i/n vec(off(G)) — A^p2(0, (Jp2 — Dp^p)Sr* (/p2 — 
Dp^p)). Now it follows from ( jTanl . Il977l Theorem 3.1) that the limiting distribution of 



n||off((j)|p is that of ^jX? where Xi? ••••jXI a-^e independent chi squared variables with 
one degree of freedom, and 5i,...,5k are the k nonzero eigenvalues (including all algebraic 
multiplicities) of 

(Jp2 — Dp^p)Sr*(/p2 — Dp^p). 
The Proof of Theorem [33] 

The proof follows from the fact that y/n^DnGu — 1) = op(l). Then also y/nDnGij = 
^JnDnGij + op(l) for all i ^ j. 

The Proof of Lemma 14.11 

Let A = {ttij) he a. p X p matrix having at least one nonzero element in each row and let 

a2 

A = iaij) = 2 ■ Let £ denote the set of all nonsingular p x p diagonal matrices and let 
L = (lij) e £. Now 

p p p 

\\LA-ipf = j2Yl ^^^4 - 2 + p 

i=l j=l i=l 
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and 

d ^ 
-gj—\\LA — IpW = ^ ^ a^j — tti 

The derivatives are zero with choices 

n ■ ■ 



and the value of \\LA — /p||^ is then 



4 



i=i Z^i=i% 



Let P denote the set of all p x p permutation matrices. Now it follows that if G = f fi, and 
Gij = G]j/J2k=iG'^ky "^'i — •••jP; then the minimum distance index can be written as 



^ = ^(G) = ^(p-max(tr(PG))) 



1/2 



The Proof of Theorem 14.11 

Let A = (aij) he a. p x p matrix having at least one nonzero element in each row. Let 
A = {aij) with (lij = '^^a • Let V denote the set of all p x p permutation matrices. Now 
the shortest squared distance D^{A) = :^{p — maxpgp(tr(PA))). (See the proof of Lemma 
14.11 ) Consider now tr(Pyl), where a^j > 0, for all i,j and X]j=i ^'i — 1- Now clearly the 
maximum value of maxpg7[5tr(PA) is p and it is attained if and only if A is a permutation 
matrix. Since A is a permutation matrix if and only if A ~ Ip, we have now proven that 
D^{A) > for all A and that D\A) = if and only if A ~ Ip. 

For the minimum value of maxpg-p tr(PA) note that 



maxtr(Pi) > 1 ^ tr(Pi) = ^ ^ ^ 5., = L 

^' P&V ^ i=l j=l 

If maxpgp tr(Py4) = 1, then tr(Py4) = 1 for all permutation matrices P. Since all row sums 
of A are one, if rows ii ^ 12 oi A are different, there have to exist indices ji 7^ j2 such that 

now Pi and P2 denote permutation matrices which 
are identical in all rows i ^ {ii,i2} and in all columns j ^ {ji,j2}, and let the elements 
iiji and 2272 of Pi be equal to one, and let the elements iij2 and i2ji of P2 be equal to one. 
Then tr(PiA) > tr(P2A) contradicting the fact that tr(PA) is identical for all permutation 
matrices P. Hence A ~ IpO^ for some vector a. We have now proven that 1 > D'^{A) for 
all A and that D'^{A) = 1 if and only if A ~ IpO^ for some p- vector a. 

Assume now that aij < l,i 7^ j and let P = Jp + c off (A), where c G [0,1] and let 

B = ipij) = . Then maxptr(PP) = X]i=i ^a^p — a'^.+i - Now clearly maxptr(PP) 

decreases when c increases. This proves that the function c — )■ D'^{Ip + c off (A)) is increasing 
in c G [0, 1] for all matrices A such that A^j < 1, i 7^ j. 



22 



The Proof of Theorem ICT 



Let r(F^) = fl = C{Fx) = Ip and let -y/n vec(f — Ip) -^d Np2{0, S). Let V denote the set 
of all p X p permutation matrices and let C denote the set of all nonsingular p x p diagonal 
matrices. 



We have 



Let 



D = , inf llCr - L\\ = Tnm(mf \\LPT - L 



Pm = argminpgp(inf ||LPr - Ip\\), Lm = arginf^^^ HLP^L - /p||, 



Pm = argminpgp(inf ||LPr - Ip\\), = arginf^g^HLP^r - /, 



PI 



and for all P G P let Lp = arginf^g£||LPf — and Lp = arginf^^g^HLPF — Now for 
all PeV, iLp)u = Bu/ E?=i A' , where P,, = (Pp),, = (Pf),, and (Lp),, = B^/ E?=i 

where Pjj = {Bp)ij = {PT)ij, (see the proof of Lemma 1411) . Let P E V. Since (F — F) — >■ 0, 

^ p _ 

it now follows from the continuous mapping theorem that also (PF — PF) — i- and thus 

(Lp - Lp) 4 0. Since (Lp - Lp) 4 holds for all P G it follows that (An - Pm) ^ 

^ p _ 

and {Lm — L^) — )■ as well. Clearly 



P — T — T 

Since Pm and Pm are discrete, we now have, by using Slutsky's theorem, that 

\/n vec(Lm,Anf - Ip) = \fn vec{Lm - Ip) + a/h vec(f - Ip) + op(l). 



Consider now 

y/n vec{Lm - Ip)- 

Let A = [ciij) = PmT and define diagonal matrices Da = (Daju = an, = (Db)ii = ■ 
Now 

\/n{Lm - Ip) = VniDa - Ip)Db + Vn{Db - Ip) 
and it follows from the convergency of Pm and F that {Da ~ Ip) ^ and {Db — Ip) — ?■ 0. 

Consider now the ii element of the matrix ^/n{Dh — Ip). We have ^/n{ ^p ^ — 1) = 

^Jn Y-p^~-a'^ - It now follows from our assumptions and discreteness of Pm and Pm that each 

y/nciij, i ^ j converges to normal distribution with zero mean. Now each naf^, i ^ j 
converges in distribution to a variable and thus each y/naf^, i ^ j converges in probability 

to zero and {^/n ^p'^\t"' -^/n ^p^i ) = {^/n - ( _i.t"".ij {^/n^an-l)))) converges 

in probability to zero as well. Now since _ y-"p°"a^ = — 2 + op(l), it follows from Slutsky's 
theorem that 

vec(Lm - Ip) = -y/n vec{Da ~ Ip) + op(l). 
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Since (t — Ip) = (P^f — /p) + ((-Pm — -^p)f ), we now have by Slutsky's theorem and discreteness 

^/ndmg(T - Q = ^/ndisig{Pmt - Ip) + op(l). 



of Pm that 



Since 

Vn{Da - Ip) = ^/ndisig{Pmt - Ip) 
we conclude, using Slutsky's theorem again, that 



n vec{LmPmT - Ip) N{0, S2) 



where 



E2 = ASCOViV^ vec(off(f))) = {Ip2 - Dp^p)E{Ip2 - Dp^p) 
with Dp^p = X]r=i(eief) ® (ejcf). Thus 



n 



p — 1 



|off(f)f + Op(l^ 



and it follows from (iTanl . 119771 . Theorem 3.1), that the limiting distribution of nD^ is that of 
(p — 1)~^ Yli=i ^iXl where Xi; x1 independent chi squared variables with one degree 
of freedom, and 61, ...,6k are the k nonzero eigenvalues (including all algebraic multiplicities) 
of E2. 
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